-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: datetime rolling min/max segfaults when closed=left (#21704) #21853
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello @changhiskhan! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on July 13, 2018 at 05:02 Hours UTC |
pandas/tests/test_window.py
Outdated
@@ -3772,7 +3816,7 @@ def test_ragged_apply(self, raw): | |||
|
|||
df = self.ragged | |||
|
|||
f = lambda x: 1 | |||
def f(x): return 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The linter doesn’t like this. You need to either revert this change or put the return on the next line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like i accidentally had autopep8 or something on. i'll revert the change
pandas/tests/test_window.py
Outdated
index=pd.date_range('2000', periods=10)) | ||
ser[ser.index[-3:]] = np.nan | ||
result = ser.rolling('3D', min_periods=2, closed='left').min() | ||
print(ser) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this print needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no i'll take it out
pandas/_libs/window.pyx
Outdated
W.push_back(i) | ||
output[0] = calc_mm(minp, nobs, input[Q.front()]) | ||
close_offset = 0 | ||
else: # if right is open then the first output is always NaN |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two spaces before comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Addressed in update.
Codecov Report
@@ Coverage Diff @@
## master #21853 +/- ##
=======================================
Coverage 91.91% 91.91%
=======================================
Files 164 164
Lines 49992 49992
=======================================
Hits 45951 45951
Misses 4041 4041
Continue to review full report at Codecov.
|
Not sure why the linter is failing on the travis-ci build. I updated my feature branch with E704 issues, and runnning ci/lint.sh locally passes just fine... |
ack, now it's reporting the travis-ci has passed. I guess it just didn't update yet after i pushed |
@jreback this looks about ready, but I don't know this part of the code that well. |
Piggy-backing off this PR for a pandas-dev question.
I'm not even sure where the "en_IL" is coming from:
|
pandas/_libs/window.pyx
Outdated
@@ -1241,49 +1242,69 @@ cdef _roll_min_max(ndarray[numeric] input, int64_t win, int64_t minp, | |||
input, win, | |||
minp, index, closed) | |||
|
|||
output = np.empty(N, dtype=input.dtype) | |||
output = np.empty(N, dtype=float) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this coercing integer input to float?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, for two reasons:
- because when
closed=left
the first output should be NaN. - because output elements up to minp should also be NaN.
I believe this behavior is consistent with rolling sum (coerced to float).
pandas/_libs/window.pyx
Outdated
ndarray[int64_t] starti, endi | ||
ndarray[numeric, ndim=1] output | ||
ndarray[double_t, ndim=1] output |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What was the reason for changing the type here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
pandas/tests/test_window.py
Outdated
@@ -464,6 +464,49 @@ def test_closed(self): | |||
with pytest.raises(ValueError): | |||
df.rolling(window=3, closed='neither') | |||
|
|||
def test_closed_min_max_datetime(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you parametrize this test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed in update. Waiting for CI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. if you can parameterize that one test by the closed parameter & test a few differernt dtypes would be great. ping on green.
actually, hold on, i think I found a problem with my fix. I'll ping again once it's ready. Thanks. |
sorry this was a little bit more involved than I thought. I ended up adding more test cases as well and separated the variable vs fixed window code into two different functions so it's a bit easier to read. |
…21704) User reported that `df.rolling(to_offset('3D'), closed='left').max()` segfaults when df has a datetime index. The bug was in PR pandas-dev#19549. In that PR, in https://github.com/pandas-dev/pandas/blame/master/pandas/_libs/window.pyx#L1268 `i` is initialized to `endi[0]`, which is 0 when `closed=left`. So in the next line when it tries to set `output[i-1]` it goes out of bounds. In addition, there are 2 more bugs in the `roll_min_max` code. The second bug is that for variable size windows, the `nobs` is never updated when elements leave the window. The third bug is at the end of the fixed window where all output elements up to `minp` are initialized to 0 if the input is not float. This PR fixes these three bugs, at the cost of casting the output array to floating point even if the input is integer. This is less than ideal if the output has no NaNs, but is consistent with roll_sum behavior.
@jreback I think this is ready for you to take another look now. |
looks good @changhiskhan if you can just run the asvs for min/max window ops to make sure nothing has significatnly changed. @WillAyd if any comments, merge on good asv results. |
@WillAyd if you are ok, pls approve & merge |
User reported that
df.rolling(to_offset('3D'), closed='left').max()
segfaults when df has a datetime index. The bug was in PR #19549. In
that PR, in https://github.com/pandas-dev/pandas/blame/master/pandas/_libs/window.pyx#L1268
i
is initialized toendi[0]
, which is 0 whenclosed=left
.So in the next line when it tries to set
output[i-1]
it goes out of bounds.In addition, there are 2 more bugs in the
roll_min_max
code.The second bug is that for variable size windows, the
nobs
is never updatedwhen elements leave the window. The third bug is at the end of the fixed
window where all output elements up to
minp
are initialized to 0 ifthe input is not float.
This PR fixes all three of the aforementioned bugs, at the cost of casting the
output array to floating point even if the input is integer. This is less
than ideal if the output has no NaNs, but is still consistent with roll_sum
behavior.
git diff upstream/master -u -- "*.py" | flake8 --diff